Training for Ocean Health Index regionalization: Canada, China, Israel, Baltic, S America
Topic from local Meetups: R, Data Science
Why I love Github, R and RStudio
June 25, 2014
US West Coast: Halpern et al. (2014) PLoS ONE
Brazil: Elfes et al. (2014) PLoS ONE
Flower
Map
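For example, calculate the Baltic Health Index for every year, using one scenario folder per year (bhi1980, ..., bhi2014):
library(ohicore)
# one scenario folder per year: ~/ohibaltic/bhi1980 ... ~/ohibaltic/bhi2014
for (dir_scenario in sprintf('~/ohibaltic/bhi%d', 1980:2014)){
  setwd(dir_scenario)                      # work inside the scenario folder
  conf   = Conf('conf')                    # load the scenario configuration
  layers = Layers('layers.csv', 'layers')  # load the layers for this scenario
  scores = CalculateAll(conf, layers)      # calculate scores for this year
  write.csv(scores, 'scores.csv')          # save scores in the scenario folder
}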
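The scenario folders in the loop come from a single vectorized `sprintf` call. A minimal base R sketch of what that call produces, with no ohicore functions involved:

```r
# sprintf() is vectorized over its arguments: one path per year
years <- 1980:2014
dirs  <- sprintf('~/ohibaltic/bhi%d', years)

length(dirs)                  # 35 scenarios, one per year
head(dirs, 2)                 # "~/ohibaltic/bhi1980" "~/ohibaltic/bhi1981"
basename(dirs[length(dirs)])  # "bhi2014"
```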
free, cross-platform, open source, web based:
shiny web application, ggplot2 figures, dplyr data manipulation
ohiprep | ohi-[scenario] | ohicore
ohicore
library(devtools)
install_github('ohi-science/ohicore')
ohi-global
Download ZIP, Clone, Fork
| github.com/[org]/[repo] (org web) | github.com/[user]/[repo] (user web) | ~/github/[repo] (user local) | |
|---|---|---|---|
| ->1x | -> fork | -> clone | |
| <- | merge pull request {admin} <- | <- pull request | <- push, <-> commit |
where:
[org] is an organization (eg ohi-science)
[repo] is a repository in the organization (eg ohiprep)
[user] is your github username (eg bbest)
Track Changes View with "Rendered" button to view differences between versions of a text file: additions in green, removals in red strikethrough.
CSV View allows for on the fly tabular view, searching for text, and linking to specific rows of data.
Geographic View of GeoJSON renders automatically as a map.
markdown is a plain text formatting syntax for conversion to HTML (with a tool)
r markdown enables easy authoring of reproducible web reports from R
in rstudio
chunks: text, tables, figures
inline: pi=`r pi` evaluates to "pi=3.1416"
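Inline code is evaluated when the document is rendered, and its value is substituted into the sentence. A rough base R equivalent of the string shown above, assuming the renderer displays five significant digits (an assumption about the rendering settings, not something stated in the slides):

```r
# Build the text that the inline expression would contribute
# (digits = 5 is an assumed display precision)
value  <- format(pi, digits = 5)   # "3.1416"
inline <- paste0('pi=', value)     # "pi=3.1416"
inline
```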
inline
The arithmetic mean is equal to $\frac{1}{n} \sum_{i=1}^{n} x_{i}$, or the summation of n numbers divided by n.
The arithmetic mean is equal to \(\frac{1}{n} \sum_{i=1}^{n} x_{i}\), or the summation of n numbers divided by n.
chunked
$$
\frac{1}{n} \sum_{i=1}^{n} x_{i}
$$
\[ \frac{1}{n} \sum_{i=1}^{n} x_{i} \]
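As a quick check of the formula, here is a minimal base R sketch that computes the mean by hand and compares it with the built-in `mean()`. The vector `x` holds illustrative values that are not from the slides.

```r
# Arithmetic mean computed from the formula, checked against mean()
x <- c(2, 4, 6, 8)          # illustrative values
n <- length(x)
manual <- sum(x) / n        # (1/n) * sum of x_i for i = 1..n
manual                      # 5
isTRUE(all.equal(manual, mean(x)))  # TRUE
```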
post to rpubs from rstudio
this presentation http://rpubs.com/bdbest/rprod
more on Authoring R Presentations
post to github
natively renders markdown (*.md)
easy to see change in simple text files (vs binary / proprietary formats)
RStudio: File > New Project > Version Control
clone
commit and push
dplyr is the next iteration of plyr, focussed on tools for working with data frames.
Calculate the batting average (AVG): number of base hits (H) divided by the total number of at bats (AB) using the Lahman baseball database. Limit to Babe Ruth and Jackie Robinson.
Setup
library(Lahman)
library(dplyr)
library(RSQLite)
Answer
nameFirst nameLast avg
Babe Ruth 0.323
Jackie Robinson 0.308
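Both queries in this section average the per-row ratio H/AB (one Batting row per player, season and stint) rather than dividing total hits by total at bats, so short seasons weigh as much as long ones. A minimal base R sketch with made-up numbers (the `seasons` data frame is hypothetical, not Lahman data) shows how the two definitions can differ:

```r
# Hypothetical two-season record for one player (made-up numbers)
seasons <- data.frame(H = c(10, 100), AB = c(20, 400))

# Mean of per-season ratios: (0.50 + 0.25) / 2
mean_of_ratios <- mean(seasons$H / seasons$AB)

# Pooled ratio: total hits over total at bats, 110 / 420
pooled_ratio <- sum(seasons$H) / sum(seasons$AB)

round(mean_of_ratios, 3)  # 0.375
round(pooled_ratio, 3)    # 0.262
```

Which definition is appropriate depends on the question being asked; the sketch only illustrates that they are not interchangeable.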
sql
tbl(lahman_sqlite(), sql(
  "SELECT nameFirst, nameLast, ROUND(AVG(H/(AB*1.0)), 3) AS avg
   FROM Batting
   JOIN Master USING (playerID)
   WHERE AB > 0
     AND ((nameFirst = 'Babe' AND nameLast = 'Ruth')
       OR (nameFirst = 'Jackie' AND nameLast = 'Robinson'))
   GROUP BY nameFirst, nameLast
   ORDER BY avg DESC"))
Chaining (%>%, formerly %.%): grammar of data manipulation
Batting %>%
  merge(Master, by='playerID') %>%
  filter(
    AB > 0 &
    ((nameFirst=='Babe' &
      nameLast =='Ruth') |
     (nameFirst=='Jackie' &
      nameLast =='Robinson'))) %>%
  group_by(nameFirst, nameLast) %>%
  summarise(avg = round(mean(H/AB), 3)) %>%
  arrange(desc(avg))
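In R, `&` binds more tightly than `|`, so a condition written as `a & b | c` is read as `(a & b) | c`. A small base R sketch with hypothetical rows (the data frame `d` is made up for illustration) shows why the grouping of the name conditions matters if `AB > 0` is meant to apply to both players:

```r
# Hypothetical rows: the second player has one row with AB == 0
d <- data.frame(
  name = c('Ruth', 'Robinson', 'Robinson'),
  AB   = c(100, 0, 50)
)

# Without grouping the names: AB > 0 only guards the first name
loose  <- d$AB > 0 & d$name == 'Ruth' | d$name == 'Robinson'

# With the names grouped: AB > 0 guards both names
strict <- d$AB > 0 & (d$name == 'Ruth' | d$name == 'Robinson')

loose   # TRUE TRUE TRUE
strict  # TRUE FALSE TRUE
```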